Here are the packages we will be using:
library(statnet)
library(igraph)
library(intergraph)
{igraph} and {statnet} are two of the most commonly used packages for social network analyses. Both include built-in functions that can plot networks and answer almost every beginner-level question about a given network. However, there may be some functions that are present in one package but not the other (or just work better). So, {intergraph} is another useful package that helps users transition between {igraph} and {statnet}.
When analyzing social network data, we want to know to what degree our variables impact the formation of a particular network organization. Our independent variables or predictor variables are node level variables (e.g. age) or similarity between node level variables (e.g. whether pairs belong to the same age category). Our dependent variable is the organization of edges in the network.
We cannot use standard OLS regressions to analyze pair characteristics and the presence of edges between pairs because it violates the assumption that observations are independent. For example, observations A-B, A-C, and A-D are not independent because they all involve individual A. Or if one bonobo is GG rubbing with 5 other bonobos, those 5 other bonobos may also be GG rubbing with each other. Quadratic assignment procedure (QAP) addresses this issue of non-independence.
QAP is a resampling-based method that controls for non-independence
in network structure via random permutations. These permutations use
different arrangements of the rows and columns in the adjacency matrix.
Thus, the network structure is maintained but the arrangement of
individuals in the structure is randomized. This represents the null
hypothesis because it should eliminate any potential correlations
between ties and independent values.
There are 3 types of QAP regressions we can run with {statnet}: correlation regressions, multiple/linear regressions, and logistic regressions. We will describe and demonstrate each regression using the bonobo data. We’ll also do the same with a more complex dataset from spider monkeys!
Correlation QAPs simply test whether two social networks are correlated. For example, using our bonobo datasets, we can ask: Do bonobos who groom each other correlate with bonobos who GG rub with one another?
What is our null hypothesis and alternative hypothesis in this question?
We can start by using the gcor() function to get a correlation coefficient between the two networks.
gcor(bonobo_ggr_net, bonobo_groom_net)
## [1] 0.1111054
What does this correlation coefficient signify?
These networks don’t seem very correlated but let’s test for significance anyways. The qaptest() function uses Monte Carlo simulations of likelihood quantiles to test arbitrary graph-level statistics against a QAP null hypothesis.
qap_cor <- qaptest(list(bonobo_ggr_net, bonobo_groom_net), # include both network objects in a list
gcor, # the function you're using is correlation between networks (gcor)
g1=1, # use first graph in the list
g2=2, # use second graph in the list
reps = 1000) # number of permutations to run
summary(qap_cor)
##
## QAP Test Results
##
## Estimated p-values:
## p(f(perm) >= f(d)): 0.356
## p(f(perm) <= f(d)): 0.797
##
## Test Diagnostics:
## Test Value (f(d)): 0.1111054
## Replications: 1000
## Distribution Summary:
## Min: -0.4191705
## 1stQ: -0.1010049
## Med: 0.005050247
## Mean: 0.006853186
## 3rdQ: 0.1111054
## Max: 0.8534918
Under “Test Diagnostics”, the test value represents the correlation coefficient between grooming and GG rubbing networks. This is interpreted at the level of the dyad, pairs who groom each other are likely not related to pairs who GG rub with one another. We previously calculated this using the gcor() function.
Under “Estimated p-values”, we find statistics that tell us about how this observed correlation compares to what the correlation looks like when we randomly permute one of our networks. Note that these are both one-tailed p-values! We can see that our p-values are greater than 0.05, meaning we have significant evidence to fail to reject our null hypothesis. There is no significant correlation between grooming and GG rubbing networks.
We can also visualize the “Distribution Summary” using the plot() function.
plot(qap_cor)
The dotted line gives us the correlation among the observed networks. The curve shows the distribution of correlations across permuted networks. Our plot shows that across 1000 replications, very rarely do we see any correlation between grooming and GG rubbing networks in this bonobo population. The correlation tends to center around 0.
Linear regression QAPs test the extent to which predictor variables are affecting edge weights. Recall that our predictor variables are node attributes.
For example, if our predictor variable is rank, we can ask: Are high-ranking or low-ranking individuals more tied through GG rubbing? In other words, we are asking: Is a bonobo more likely to have more weighted edges for each unit increase in rank?
Predictor variables must be matrices in order to be used in the QAP functions. We can look at whether rank affects the weight of ties sent or received (or both, but they must be added to the model separately).
First, we must create a matrix that shows the rank of senders. Note that senders are represented by ROWS in the matrices.
nodes <- 7 # number of nodes in the data set
rank <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we are interested in
rank_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) # create empty matrix to be filled for senders
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_sending[i,] <- rep(rank[i], nodes)} # the rank of each actor is repeated over entire ROW of matrix
rank_sending
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23
Next, we must create a matrix that shows the rank of receivers. Note that receivers are represented by COLUMNS in the matrices.
rank_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes) # empty matrix for receivers
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
rank_receiving[,i] <- rep(rank[i], nodes)} # the rank of each receiver is repeated over entire COLUMN of matrix
rank_receiving
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
Now we can run our linear regression! First we are asking: Does rank affect how many times bonobos G-G rubbed others (the weight of the edges they sent)?
lrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_sending)) # list of predictor variables as network or matrix objects
summary(lrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.4534205 -0.3093578 -0.2036344 0.5465795 0.8695587
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.8507548 0.994 0.006 0.012
## x1 -0.1161796 0.015 0.985 0.029
##
## Residual standard error: 0.4587 on 40 degrees of freedom
## Multiple R-squared: 0.06227 Adjusted R-squared: 0.03883
## F-statistic: 2.656 on 1 and 40 degrees of freedom, p-value: 0.111
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.70602 -1.88290
## 1stQ -0.63821 -0.48739
## Median 0.06488 0.06842
## Mean 0.04012 0.05231
## 3rdQ 0.75416 0.54608
## Max 2.70979 1.87439
x1 is gg rubbing, estimate = -0.1161796 is beta 1, intercept = 0.8507548 is beta 0 when an individual’s rank is 0, the predicted weight of a tie for gg rubbing is -0.1 - this makes no sense but it doesn’t matter because the model is not significant
r^2 is how much variance the model is soaking up, rank does not predict much of gg rubbing intensity.
Because the F-statistic’s p-value is not significant, we can say that the rank of sender is not significantly explaining much of the variation of GG rubbing intensity (weight of the ties between individuals)
Does rank affect how many times bonobos received G-G rubbing from others (the weight of the edges they receive)?
lrqap2 <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_receiving)) # list of predictor variables as network or matrix objects
summary(lrqap2)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.3992884 -0.3393159 -0.2434687 0.6007116 0.8021901
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.64715036 0.996 0.004 0.011
## x1 -0.07247427 0.054 0.946 0.096
##
## Residual standard error: 0.4679 on 40 degrees of freedom
## Multiple R-squared: 0.02423 Adjusted R-squared: -0.0001605
## F-statistic: 0.9934 on 1 and 40 degrees of freedom, p-value: 0.3249
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1
## Min -2.116679 -1.233718
## 1stQ -0.565970 -0.507257
## Median 0.084688 -0.003749
## Mean 0.061078 -0.023871
## 3rdQ 0.727731 0.410454
## Max 2.053671 1.347807
[DESCRIBE/INTERPRET OUTPUT] rank of receiver is even less predictive of intensity of GG rubbing, nd we know this based on […]
We can include multiple predictor variables! We can also use another network as a predictor variable! By doing so, we are conducting a multiple regression QAP. For example, we can ask: Do sender rank and grooming relationships predict the number of times bonobos G-G rubbed in the network?
mrqap <- netlm(bonobo_ggr_net, # response variable is a network object with a weighted adjacency matrix
list(rank_receiving, bonobo_groom_net)) # list of all predictor variables as network or matrix objects
summary(mrqap)
##
## OLS Network Model
##
## Residuals:
## 0% 25% 50% 75% 100%
## -0.4834824 -0.3406152 -0.2400259 0.5709808 0.8093067
##
## Coefficients:
## Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 0.63034548 0.992 0.008 0.014
## x1 -0.07893218 0.041 0.959 0.072
## x2 0.12308498 0.805 0.195 0.451
##
## Residual standard error: 0.4699 on 39 degrees of freedom
## Multiple R-squared: 0.04076 Adjusted R-squared: -0.008433
## F-statistic: 0.8286 on 2 and 39 degrees of freedom, p-value: 0.4442
##
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Coefficient Distribution Summary:
##
## (intercept) x1 x2
## Min -2.18779 -1.30301 -2.88601
## 1stQ -0.57197 -0.43084 -0.82754
## Median 0.03496 0.05656 -0.08460
## Mean 0.05476 0.03143 0.01229
## 3rdQ 0.69669 0.52417 0.66938
## Max 2.05025 1.53832 10.38170
[DESCRIBE/INTERPRET OUTPUT] not significant based on these values
Logistic regression QAPs test the extent to which independent
variables are affecting the presence or absence of edges. If you have a
network with binary ties, we can use this regression method to predict
ties in the outcome network (response variable)
Using our bonobo
dataset, we can ask: Does older age make an individual more or
less likely to G-G rub with other individuals?
In this
example, our GG rubbing network is the response variable while age is
our predictor variable.
If you imagine age (as the predictor variable) on the x axis, for every year a bonobo gets older, are they more likely to GG rub with more individuals? Will an older individual have more ties on the social network?
Similar to the multiple regression, we must first create a matrix that shows the rank of senders and receivers. As a reminder, senders are represented by ROWS while receivers are represented by COLUMNS in matrices
nodes <- 7 # number of nodes in the data set
age <- bonobo_ggr_net %v% "rank" # a vector of the node-level variable we're interested in
age_sending <- matrix(data = NA, nrow = nodes, ncol = nodes) #create empty matrix to be filled
age_receiving <- matrix(data = NA, nrow = nodes, ncol = nodes)
for (i in 1:nodes){ # for 1 through the number of nodes in the data set
age_sending[i,] <- rep(age[i], nodes)} # The age of each actor is repeated over entire ROW of matrix
age_sending
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [2,] 3.42 3.42 3.42 3.42 3.42 3.42 3.42
## [3,] 6.20 6.20 6.20 6.20 6.20 6.20 6.20
## [4,] 4.66 4.66 4.66 4.66 4.66 4.66 4.66
## [5,] 4.11 4.11 4.11 4.11 4.11 4.11 4.11
## [6,] 5.57 5.57 5.57 5.57 5.57 5.57 5.57
## [7,] 5.23 5.23 5.23 5.23 5.23 5.23 5.23
for (i in 1:nodes){
age_receiving[,i] <- rep(age[i], nodes)}
age_receiving
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [2,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [3,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [4,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [5,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [6,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
## [7,] 3.42 3.42 6.2 4.66 4.11 5.57 5.23
Now we can run our logistic regression! The netlogit() function performs a logistic regression of the network variable for the response variable (bonobo_ggr_net) on the network variables in the predictor variables (age_receiving and age_sending). The resulting fits and coefficients are tested against the null hypothesis.
logqap <- netlogit(bonobo_ggr_net, # response variable is a network object with an unweighted adjacency matrix
list(age_receiving, age_sending), # list of all predictor variables as network or matrix objects
reps = 1000) # number of draws for quantile estimation, 1000 reps is the default
summary(logqap)
##
## Network Logit Model
##
## Coefficients:
## Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
## (intercept) 4.5524763 94.8670398 0.978 0.022 0.049
## x1 -0.4903676 0.6124013 0.031 0.969 0.064
## x2 -0.6817886 0.5057117 0.009 0.991 0.019
##
## Goodness of Fit Statistics:
##
## Null deviance: 58.22436 on 42 degrees of freedom
## Residual deviance: 47.45396 on 39 degrees of freedom
## Chi-Squared test of fit improvement:
## 10.77041 on 3 degrees of freedom, p-value 0.01303442
## AIC: 53.45396 BIC: 58.66696
## Pseudo-R^2 Measures:
## (Dn-Dr)/(Dn-Dr+dfn): 0.2040994
## (Dn-Dr)/Dn: 0.1849811
## Contingency Table (predicted (rows) x actual (cols)):
##
## Actual
## Predicted 0 1
## 0 25 11
## 1 4 2
##
## Total Fraction Correct: 0.6428571
## Fraction Predicted 1s Correct: 0.3333333
## Fraction Predicted 0s Correct: 0.6944444
## False Negative Rate: 0.8461538
## False Positive Rate: 0.137931
##
## Test Diagnostics:
##
## Null Hypothesis: qap
## Replications: 1000
## Distribution Summary:
##
## (intercept) x1 x2
## Min -2.130605 -1.578398 -2.019697
## 1stQ -0.641907 -0.541719 -0.578321
## Median 0.050319 0.024150 -0.013246
## Mean 0.028454 0.024672 0.004196
## 3rdQ 0.707473 0.625960 0.556103
## Max 2.059569 1.679676 2.026379
As a reminder, we are testing whether age affects the likelihood of giving and receiving GG rubbing. In the output above, x1 represents the first variable in the model (age_receiving), and x2 represents the second variable in the model (age_sending). [DESCRIBE/INTERPRET OUTPUT]
https://stackoverflow.com/questions/40878163/interpreting-netlogit-output
[…]